Online Decision Based Visual Tracking via Reinforcement Learning

Neural Information Processing Systems

A deep visual tracker is typically based on either object detection or template matching, and each of them is only suitable for a particular group of scenes. It is straightforward to consider fusing them together to pursue more reliable tracking. However, this is not wise, as they follow different tracking principles. Unlike previous fusion-based methods, we propose a novel ensemble framework, named DTNet, with an online decision mechanism for visual tracking based on hierarchical reinforcement learning. The decision mechanism implements an intelligent switching strategy in which the detection and template trackers compete with each other to conduct tracking in the different scenes they are adept at. Besides, we present a novel detection tracker that avoids the common issue of incorrect proposals. Extensive results show that our DTNet achieves state-of-the-art tracking performance as well as a good balance between accuracy and efficiency. The project website is available at https://vsislab.github.io/DTNet/.


Review for NeurIPS paper: Online Decision Based Visual Tracking via Reinforcement Learning

Neural Information Processing Systems

Additional Feedback:
--------------- Post Rebuttal ------------------
Comments on the rebuttal:
- The authors provide results of fusing existing SOTA trackers with the proposed switching strategy. On all datasets, the results are only marginally better than selecting the better of the two trackers (0.001 to 0.007 AUC/EAO). This improvement is rather minor and difficult to put into context, since no other fusion methods are compared. Naive baselines exist, for example simply averaging the two bounding boxes, and this exact topic has attracted substantial research interest over the years: MCCT, LCT, MEEM, [N.
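The naive fusion baselines the reviewer mentions can be sketched in a few lines. This is a minimal illustration, assuming boxes in a hypothetical `(x, y, width, height)` format; `average_boxes` and `select_by_score` are illustrative names, not part of DTNet or any tracking library.

```python
from typing import Tuple

# Hypothetical box format: (x, y, width, height)
Box = Tuple[float, float, float, float]


def average_boxes(a: Box, b: Box, weight: float = 0.5) -> Box:
    """Naive fusion: a weighted mean of two trackers' boxes."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    x, y, w, h = (weight * p + (1.0 - weight) * q for p, q in zip(a, b))
    return (x, y, w, h)


def select_by_score(a: Box, score_a: float, b: Box, score_b: float) -> Box:
    """Naive switching: keep the box whose tracker reports higher confidence."""
    return a if score_a >= score_b else b
```

Both rules are fixed rather than learned, so baselines of this kind would help separate the contribution of a learned switching policy from that of the underlying trackers.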


Review for NeurIPS paper: Online Decision Based Visual Tracking via Reinforcement Learning

Neural Information Processing Systems

The initial scores were 3, 4, 7, and 8. In the response, the authors provide experimental results of fusing SOTA trackers on the larger datasets, compared with the SOTA trackers themselves, showing improved performance. The authors also provide an ablation study using hand-designed rules. During the discussion, R2 was satisfied with the new comparisons with SOTA methods and on larger datasets, but was not convinced that the fusion method was useful, since there was no ablation study comparing only fusion methods while keeping the trackers the same. R3 was mostly satisfied with the response, but the novelty concern was not fully addressed.


Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation

Li, Yingru; Xu, Jiawei; Luo, Zhi-Quan

arXiv.org Artificial Intelligence

Foundation models often struggle with uncertainty when faced with new situations in online decision-making, necessitating scalable and efficient exploration to resolve this uncertainty. We introduce GPT-HyperAgent, an augmentation of GPT with HyperAgent for uncertainty-aware, scalable exploration in contextual bandits, a fundamental online decision problem involving natural language input. We prove that HyperAgent achieves fast incremental uncertainty estimation with $\tilde{O}(\log T)$ per-step computational complexity over $T$ periods under the linear realizable assumption. Our analysis demonstrates that HyperAgent's regret order matches that of exact Thompson sampling in linear contextual bandits, closing a significant theoretical gap in scalable exploration. Empirical results in real-world contextual bandit tasks, such as automated content moderation with human feedback, validate the practical effectiveness of GPT-HyperAgent for safety-critical decisions. Our code is open-sourced at https://github.com/szrlee/GPT-HyperAgent/.
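For reference, exact Thompson sampling in a linear contextual bandit, the baseline whose regret order HyperAgent is said to match, can be sketched as below. This is not HyperAgent: it is a minimal sketch assuming a Gaussian prior `N(0, prior_var * I)`, Gaussian reward noise, and synthetic random contexts, with all parameter names chosen for illustration. Each step re-inverts a d x d precision matrix; the abstract's $\tilde{O}(\log T)$ per-step claim refers to HyperAgent, not to this sketch.

```python
import numpy as np


def linear_thompson_sampling(true_theta, T=500, n_arms=5, noise_sd=0.1,
                             prior_var=1.0, seed=0):
    """Exact Thompson sampling with a Gaussian prior and Gaussian noise,
    run on synthetic contexts (illustrative setup, not from the paper)."""
    rng = np.random.default_rng(seed)
    d = true_theta.shape[0]
    noise_var = noise_sd ** 2
    precision = np.eye(d) / prior_var   # posterior precision matrix
    b = np.zeros(d)                     # running sum of x * r / noise_var
    regret = 0.0
    for _ in range(T):
        X = rng.normal(size=(n_arms, d))        # one context vector per arm
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2                 # symmetrize against round-off
        theta_sample = rng.multivariate_normal(cov @ b, cov)
        a = int(np.argmax(X @ theta_sample))    # act greedily on the sample
        expected = X @ true_theta
        r = expected[a] + rng.normal(scale=noise_sd)
        regret += expected.max() - expected[a]
        # Conjugate Gaussian update of the posterior over theta
        precision += np.outer(X[a], X[a]) / noise_var
        b += X[a] * r / noise_var
    posterior_mean = np.linalg.solve(precision, b)
    return posterior_mean, regret
```

Sampling a parameter from the posterior and acting greedily on it is what drives exploration here; no explicit exploration bonus is added.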


Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions

Yang, Hui; Yue, Sifu; He, Yunzhong

arXiv.org Artificial Intelligence

Auto-GPT is an autonomous agent that leverages recent advancements in adapting Large Language Models (LLMs) for decision-making tasks. While there has been growing interest in Auto-GPT styled agents, questions remain regarding the effectiveness and flexibility of Auto-GPT in solving real-world decision-making tasks. Its limited capability for real-world engagement and the absence of benchmarks contribute to these uncertainties. In this paper, we present a comprehensive benchmark study of Auto-GPT styled agents in decision-making tasks that simulate real-world scenarios. Our aim is to gain deeper insights into this problem and understand the adaptability of GPT-based agents. We compare the performance of popular LLMs such as GPT-4, GPT-3.5, Claude, and Vicuna in Auto-GPT styled decision-making tasks. Furthermore, we introduce the Additional Opinions algorithm, an easy and effective method that incorporates supervised/imitation-based learners into the Auto-GPT scheme. This approach enables lightweight supervised learning without requiring fine-tuning of the foundational LLMs. We demonstrate through careful baseline comparisons and ablation studies that the Additional Opinions algorithm significantly enhances performance in online decision-making benchmarks, including WebShop and ALFWorld.


Online Decision Making for Trading Wind Energy

Muñoz, Miguel Angel; Pinson, Pierre; Kazempour, Jalal

arXiv.org Artificial Intelligence

We propose and develop a new algorithm for trading wind energy in electricity markets, within an online learning and optimization framework. In particular, we combine a component-wise adaptive variant of the gradient descent algorithm with recent advances in the feature-driven newsvendor model. This results in an online offering approach capable of leveraging data-rich environments while adapting to the nonstationary characteristics of energy generation and electricity markets, all with minimal computational burden. The performance of our approach is analyzed in several numerical experiments, which show both better adaptability to nonstationary uncertain parameters and significant economic gains.
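A feature-driven newsvendor offer rule updated online can be sketched as follows. Assumptions: the offer is linear in the features, q = w·x; shortfalls and surpluses are penalized by hypothetical unit costs `c_under` and `c_over`; and AdaGrad stands in for the paper's component-wise adaptive gradient variant, which may differ.

```python
import numpy as np


def newsvendor_loss(q, y, c_under, c_over):
    """Cost of offering q when y is realized: shortfall costs c_under per
    unit, surplus costs c_over per unit."""
    return c_under * np.maximum(y - q, 0.0) + c_over * np.maximum(q - y, 0.0)


class OnlineNewsvendor:
    """Linear offer rule q = w @ x, updated by AdaGrad-style subgradient
    steps (an assumed stand-in for a component-wise adaptive method)."""

    def __init__(self, d, c_under, c_over, lr=0.1, eps=1e-8):
        self.w = np.zeros(d)
        self.G = np.zeros(d)   # per-component sum of squared gradients
        self.c_under, self.c_over = c_under, c_over
        self.lr, self.eps = lr, eps

    def offer(self, x):
        return float(self.w @ x)

    def update(self, x, y):
        q = self.offer(x)
        if y > q:
            g = -self.c_under * x      # under-offered: push q up
        elif q > y:
            g = self.c_over * x        # over-offered: push q down
        else:
            g = np.zeros_like(x)
        self.G += g * g
        # Each coordinate gets its own step size lr / sqrt(G_j)
        self.w -= self.lr * g / (np.sqrt(self.G) + self.eps)
```

With equal costs the learned rule approximates the conditional median of production; more generally, the cost-minimizing offer for this loss is the c_under / (c_under + c_over) quantile.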


Online Optimization and Learning in Uncertain Dynamical Environments with Performance Guarantees

Li, Dan; Fooladivanda, Dariush; Martinez, Sonia

arXiv.org Machine Learning

We propose a new framework to solve online optimization and learning problems in unknown and uncertain dynamical environments. This framework enables us to simultaneously learn the uncertain dynamical environment while making online decisions in a quantifiably robust manner. The main technical approach relies on the theory of distributionally robust optimization that leverages adaptive probabilistic ambiguity sets. However, as defined, the ambiguity set usually leads to problems that are intractable online, and the first part of our work is directed at finding reformulations in the form of online convex problems for two subclasses of objective functions. To solve the resulting problems in the proposed framework, we further introduce an online version of the Nesterov accelerated-gradient algorithm. We determine how the proposed solution system achieves a probabilistic regret bound under certain conditions. Two applications illustrate the applicability of the proposed framework.
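The online use of Nesterov's accelerated gradient can be illustrated with one accelerated step per period on a time-varying convex loss. This generic sketch does not include the paper's distributionally robust reformulation or ambiguity sets; the loss sequence and the `step` and `momentum` values are illustrative assumptions.

```python
import numpy as np


def online_nesterov(grad_fns, x0, step=0.1, momentum=0.9):
    """Take one Nesterov accelerated-gradient step per period.

    grad_fns[t] returns the gradient of the period-t loss, which is only
    revealed at period t (online setting).
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                        # velocity (momentum) term
    iterates = []
    for grad in grad_fns:
        lookahead = x + momentum * v            # evaluate gradient ahead
        v = momentum * v - step * grad(lookahead)
        x = x + v
        iterates.append(x.copy())
    return iterates
```

Because each period's loss arrives only once, the method spends a single gradient evaluation per period instead of solving each problem to convergence.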


Integrated Offline and Online Decision Making under Uncertainty

De Filippo, Allegra; Lombardi, Michele; Milano, Michela

Journal of Artificial Intelligence Research

This paper considers multi-stage optimization problems under uncertainty that involve distinct offline and online phases. In particular, it addresses the issue of integrating these phases, showing how the two are often interrelated in real-world applications. Our methods are applicable under two (fairly general) conditions: 1) the uncertainty is exogenous; 2) it is possible to define a greedy heuristic for the online phase that can be modeled as a parametric convex optimization problem. We start with a baseline composed of a two-stage offline approach paired with the online greedy heuristic. We then propose multiple methods to tighten the offline/online integration, leading to significant quality improvements at the cost of increased computational effort in either the offline or the online phase. Overall, our methods provide multiple options to balance the solution quality/time trade-off, suiting a variety of practical application scenarios. To test our methods, we ground our approaches on two real case studies with both offline and online decisions: an energy management problem with uncertain renewable generation and demand, and a vehicle routing problem with uncertain travel times. The application domains feature continuous and discrete decisions, respectively. An extensive analysis of the experimental results shows that offline/online integration may indeed lead to substantial benefits.


Statistical Inference for Online Decision Making via Stochastic Gradient Descent

Chen, Haoyu; Lu, Wenbin; Song, Rui

arXiv.org Machine Learning

Online decision making aims to learn the optimal decision rule by making personalized decisions and updating the decision rule recursively. It has become easier than before with the help of big data, but new challenges also come along. Since the decision rule should be updated once per step, an offline update that uses all the historical data is inefficient in computation and storage. To this end, we propose a completely online algorithm that can make decisions and update the decision rule online via stochastic gradient descent. It is not only efficient but also supports all kinds of parametric reward models. Focusing on the statistical inference of online decision making, we establish the asymptotic normality of the parameter estimator produced by our algorithm and of the online inverse probability weighted value estimator we use to estimate the optimal value. Online plug-in estimators for the variance of the parameter and value estimators are also provided and shown to be consistent, so that interval estimation and hypothesis testing are possible with our method. The proposed algorithm and theoretical results are tested through simulations and a real data application to news article recommendation.
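The fully online loop described above can be sketched with epsilon-greedy decisions, one SGD step per observation on a linear reward model for each action, and an online inverse-probability-weighted (IPW) estimate of the value of the current greedy rule. The linear reward model, epsilon-greedy exploration, constant step size, and synthetic data are assumptions for illustration; the paper's exact algorithm, step-size schedule, and variance estimators are not reproduced.

```python
import numpy as np


def online_sgd_decisions(beta_true, T=20000, eps=0.1, lr=0.02,
                         noise_sd=0.5, seed=0):
    """Epsilon-greedy online decisions with per-action linear reward models
    fitted by SGD, plus an online IPW estimate of the greedy rule's value."""
    rng = np.random.default_rng(seed)
    n_actions, d = beta_true.shape
    beta = np.zeros((n_actions, d))        # estimated reward coefficients
    ipw_terms = []
    for _ in range(T):
        x = np.append(1.0, rng.normal(size=d - 1))   # intercept + features
        greedy = int(np.argmax(beta @ x))             # current decision rule
        a = int(rng.integers(n_actions)) if rng.random() < eps else greedy
        # Probability that epsilon-greedy chose action a for this context
        prop = eps / n_actions + (1.0 - eps) * (a == greedy)
        r = beta_true[a] @ x + rng.normal(scale=noise_sd)
        # One SGD step on the squared error of the chosen action's model
        beta[a] += lr * (r - beta[a] @ x) * x
        # IPW term: the reward counts only when the action matches the rule
        ipw_terms.append(float(a == greedy) * r / prop)
    return beta, float(np.mean(ipw_terms))
```

Only the current coefficients and a running list of IPW terms are kept, so no historical data needs to be revisited; a running mean could replace the list to make storage constant.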